Abstract

We consider the nonparametric estimation problem of time-dependent multivariate
functions observed in the presence of additive cylindrical Gaussian white noise of small
intensity. We derive minimax lower bounds for the $L^2$-risk in the proposed spatio-temporal
model as the intensity goes to zero, when the underlying unknown response function is
assumed to belong to a ball of appropriately constructed inhomogeneous time-dependent
multivariate functions, motivated by practical applications. Furthermore, we propose both
non-adaptive linear and adaptive non-linear wavelet estimators that are asymptotically
optimal (in the minimax sense) in a wide range of the so-constructed balls of inhomogeneous
time-dependent multivariate functions. The usefulness of the suggested adaptive non-linear
wavelet estimator is illustrated with the help of simulated and real-data examples.
1 Introduction

The nonparametric estimation problem of high-dimensional objects has been considered in the
literature over the last three decades. With the help of appropriate balls in function spaces, such
as Hölder, Sobolev or Besov balls, that measure the smoothness of the unknown underlying
high-dimensional object, asymptotic (as the sample size goes to infinity) optimality properties (in the
minimax sense) of various linear and non-linear estimators, such as kernel, spline or wavelet
estimators, have been obtained (see, e.g., [Wahba, 1990], [Korostelev and Tsybakov, 1993]
(regression setting) and [Klemelä, 2009] (density setting), and the references therein).

These optimality properties were studied by [Chow et al., 2001] in the case of time-dependent
multivariate response functions. Following a trend to derive theoretical properties,
[Chow et al., 2001] considered a "continuous-time" model for the estimation problem of
time-dependent multivariate functions observed in the presence of additive cylindrical Gaussian white
noise, that is, they considered

$$ dY_\varepsilon(t,x) = f(t,x)\,dx + \varepsilon\,dW(t,x), \qquad (1.1) $$

where $t \in T$ ($T$ is a compact subset of $\mathbb{R}$) is the time variable, $x \in X$ ($X$ is a compact subset of
$\mathbb{R}^d$, $d \geq 1$) is the space variable, $f \in L^2(T \times X)$ is the time-dependent multivariate function that
we wish to estimate, $dW(t,x)$ is a cylindrical orthogonal Gaussian random measure (representing
additive noise in the measurements), and $\varepsilon > 0$ is a small noise level that may be let go to
zero in order to study asymptotic properties.

A formal definition of a cylindrical orthogonal Gaussian random measure can be found in
Section 2.1 of [Chow et al., 2001]. Moreover, we understand (1.1) in a generalized sense, that
is, the observable elements are treated as linear functionals, so that the process $Y_\varepsilon(t,x)$, $t \in T$,
$x \in X$, is correctly defined (see Section 6.1). Also, without loss of generality, in the sequel, we
assume that $T = [0,1]$ and $X = [0,1]^d$.

Assume periodicity in each argument of $f(t,x)$, $t \in T$, $x \in X$. Consider Hölder
continuity in $L^2(X)$ of the derivatives of $f(t,x)$ with respect to $t \in T$, uniformly over $x \in X$,
and Hölder continuity in $L^2(T)$ of the partial derivatives of $f(t,x)$ with respect to the elements
of $x \in X$, uniformly over $t \in T$. Then, under known a-priori smoothness (i.e., knowing the
involved Hölder parameters) of $f(t,x)$, [Chow et al., 2001] constructed a non-adaptive
kernel-projection (linear) estimator and obtained an asymptotic (as $\varepsilon \to 0$) upper bound of its $L^2$-risk
(on $X$), uniformly over a set $T_1 \subset T$, that depends on $\varepsilon$ and the involved smoothness parameters
(see [Chow et al., 2001], Theorem 4.1). Moreover, they showed that, asymptotically, this
upper bound cannot be improved (see [Chow et al., 2001], Lemma 5.3), thus establishing the
asymptotic optimality (in the minimax sense) of their suggested estimator.

Our aim is twofold. From a theoretical point of view, we extend the asymptotically
optimal convergence rates derived in [Chow et al., 2001]. In particular, when smoothness is
measured in appropriate balls of inhomogeneous functions, constructed with the help of
tensor-product wavelet bases and Besov spaces, with or without a-priori knowledge of the involved
smoothness parameters, we construct, respectively, non-adaptive linear (projection) or adaptive
non-linear (block-thresholding) wavelet estimators that achieve the established asymptotically
optimal convergence rates under the $L^2$-risk. From a practical point of view, we demonstrate
the usefulness of the suggested adaptive non-linear wavelet thresholding estimator in practical
applications. In particular, we show the superiority of the suggested estimator in terms of
average mean squared error over pixel-by-pixel and slice-by-slice wavelet denoising estimators,
both with universal thresholds.

The paper is organized as follows. Section 2 provides a motivating example. Section 3
contains a brief summary of the tensor-product wavelet bases and standard Besov spaces, while
Section 4 discusses the function spaces that we consider to appropriately model the considered
inhomogeneous time-dependent multivariate functions. Section 5 contains the minimax lower
bounds for the $L^2$-risk. Section 6 introduces both non-adaptive linear and adaptive non-linear
wavelet estimators and provides their minimax upper bounds for the $L^2$-risk in a wide range
of the so-constructed balls of inhomogeneous time-dependent multivariate functions. Section 7
demonstrates the usefulness of the suggested adaptive non-linear wavelet estimator with the
help of simulated and real-data examples. Section 8 contains some concluding remarks. Finally,
Section 9 (Appendix) provides two technical lemmas that are used in the proofs of the main
theoretical results.



2 A motivating example 

Increasingly, scientific studies yield time-dependent d-dimensional images, in which the
observed data consist of sets of curves recorded on the pixels of d-dimensional images
observed at different times or wavelengths, see e.g. [Antoniadis et al., 2009]. Examples include
temporal brain response intensities measured by functional magnetic resonance imaging (fMRI)
[Whitcher et al., 2005], satellite remote sensing images of landscapes [Ju et al., 2005], and
functional brain mapping using electroencephalography (EEG) and magnetoencephalography
(MEG) [Ou et al., 2009]. In many applications, the measured curves tend to be spiky, and this
requires flexible, adaptive and local modeling of their variations. The high dimensionality and
noise that characterize such time-dependent images make it difficult to estimate the evolution
of each pixel intensity over time (or wavelength).

We now discuss a specific application that motivates the estimation of $f$ in model (1.1), and
the choice of the function spaces that we use to measure the smoothness of $f$ (see Section 4).
An example of an application whose data fit model (1.1) is satellite remote sensing imaging
of landscapes, where the data are in the form of a multiband satellite 2-dimensional image of
remote sensing measurements in various spectral bands of an area that contains roads, forests,
vegetation, lakes and fields, see [Antoniadis et al., 2009]. As an illustrative example, we display
in Figure 2.1(a) a typical temporal (or wavelength) slice (i.e., a 2-dimensional grey-level image),
and we plot in Figure 2.1(b) two curves (1-dimensional signals) corresponding to two selected
pixels highlighted in blue and green in Figure 2.1(a). With respect to model (1.1), Figure 2.1(a)
corresponds to a noisy version of $x \mapsto f(t,x)$ for some fixed $t \in T$, while Figure 2.1(b) corresponds
to a noisy version of $t \mapsto f(t,x)$ for some fixed $x \in X$.



Figure 2.1: Satellite remote sensing image (64 x 64 pixels, over 128 wavelengths): (a) a
2-dimensional image measured at a specific wavelength; (b) evolution over wavelength of the
intensities of the two pixels in green and blue marked in the image shown in (a).



3 Wavelets and Besov spaces

We briefly consider tensor-product wavelet bases of $L^2(\mathbb{R}^d)$, $d \geq 1$, and recall some of their
properties; for a detailed description of their construction, we refer to [Mallat, 2009]. Assume
that we have at our disposal a 1-dimensional scaling function (i.e., a father wavelet) $\phi$ and a
1-dimensional wavelet function (i.e., a mother wavelet) $\psi$, both with compact supports. The scaling
and wavelet functions of $\mathbb{R}^d$ at scale $j$ (i.e., at resolution level $2^j$) will be denoted by $\phi_\lambda$
and $\psi_\lambda$, respectively, where the index $\lambda$ summarizes both the usual scale and space parameters
$j$ and $k$. In other words, for $d = 1$, we set $\lambda = (j,k)$ and denote $\phi_{j,k}(\cdot) = 2^{j/2}\phi(2^j \cdot - k)$ and
$\psi_{j,k}(\cdot) = 2^{j/2}\psi(2^j \cdot - k)$. For $d \geq 2$, the notations $\phi_\lambda$ and $\psi_\lambda$ stand for the adaptation of scaling and
wavelet functions to $\mathbb{R}^d$ (see [Mallat, 2009], Chapter 7). The notation $|\lambda| = j$ will be used to
denote a wavelet at scale $j$, while $|\lambda| < j$ denotes a wavelet at scale $j'$, with $j_0 \leq j' < j$, where
$j_0$ denotes the coarse level of approximation (usually called the primary resolution level). With
the above notation, we assume that

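- the scaling functions $(\phi_\lambda)_{|\lambda|=j}$ span a finite dimensional space $V_j$ within a multiresolution
  hierarchy $V_0 \subset V_1 \subset \ldots \subset L^2(\mathbb{R}^d, dy)$, such that $\dim(V_j) = 2^{jd}$.

- the scaling functions $(\phi_\lambda)_{|\lambda|=j}$ form an orthonormal basis of $V_j$ and the wavelets $(\psi_\lambda)_{|\lambda|=j}$ form
  an orthonormal basis of $W_j$ (with $W_j$ being the orthogonal complement of $V_j$ into $V_{j+1}$).

- Let $\mathcal{Y}$ be a compact subset of $\mathbb{R}^d$, $d \geq 1$. Assuming periodicity in each argument of $y \in \mathcal{Y}$,
  and using standard wavelet bases ($d = 1$) or tensor-product wavelet bases ($d \geq 2$) of $L^2(\mathcal{Y})$
  (see, e.g., [Mallat, 2009], Chapter 7), any $f \in L^2(\mathcal{Y})$ can be decomposed as

$$ f(y) = \sum_{|\lambda|=j_0} c_\lambda \phi_\lambda(y) + \sum_{j=j_0}^{+\infty} \sum_{|\lambda|=j} \beta_\lambda \psi_\lambda(y), \quad y \in \mathcal{Y}, $$

  where

$$ c_\lambda = \langle f, \phi_\lambda \rangle_{L^2(\mathcal{Y})} \quad \text{and} \quad \beta_\lambda = \langle f, \psi_\lambda \rangle_{L^2(\mathcal{Y})}. $$

In order to simplify the notation, as is commonly done, we write $(\psi_\lambda)_{|\lambda|=j_0-1}$ for
$(\phi_\lambda)_{|\lambda|=j_0}$, and, thus, $f$ can be written in the compact form

$$ f(y) = \sum_{j=j_0-1}^{+\infty} \sum_{|\lambda|=j} \alpha_\lambda \psi_\lambda(y), \quad y \in \mathcal{Y}, $$

where $\alpha_\lambda$ denotes either the scaling coefficients $c_\lambda$ or the wavelet coefficients $\beta_\lambda$.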

Consider also the following balls of (inhomogeneous) Besov spaces.

Let $s_1 > 0$ be a smoothness parameter in the domain $\mathcal{Y}_1$ (that is, $\mathcal{Y}$ with $d \geq 2$), and
let $1 \leq p_1, q_1 \leq +\infty$. Let $(\psi_\lambda)_{|\lambda|=j}$, $j \geq j_0 - 1$, be the (periodic) d-dimensional (tensor-product)
compactly supported orthonormal wavelet basis of $L^2(\mathcal{Y}_1)$, with the convention that $(\psi_\lambda)_{|\lambda|=j_0-1}$
denotes the scaling functions $(\phi_\lambda)_{|\lambda|=j_0}$. Assume that the 1-dimensional scaling function $\phi$ and
the 1-dimensional wavelet function $\psi$ are $r_1$-times continuously differentiable (regularity of the
wavelet system $(\phi, \psi)$) with $0 < s_1 < r_1$, and assume that $s_1 + d(1/2 - 1/p_1) > 0$. Define the
norm $\|\cdot\|^{s_1}_{p_1,q_1}$ by

$$ \|f\|^{s_1}_{p_1,q_1} = \left( \sum_{j=j_0-1}^{+\infty} 2^{j q_1 (s_1 + d(1/2 - 1/p_1))} \left( \sum_{|\lambda|=j} |\alpha_\lambda|^{p_1} \right)^{q_1/p_1} \right)^{1/q_1}, $$

with the respective above sums replaced by a maximum if $p_1 = +\infty$ and/or $q_1 = +\infty$. Then, the
norm $\|\cdot\|^{s_1}_{p_1,q_1}$ is equivalent to the traditional Besov norm (see e.g. [Härdle et al., 1998] for further
details), and one can thus define the following Besov ball of radius $A_1 > 0$:

$$ B^{s_1}_{p_1,q_1}(A_1) = \left\{ f \in L^2(\mathcal{Y}_1) : \|f\|^{s_1}_{p_1,q_1} \leq A_1 \right\}. $$

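Let $s_2 > 0$ be a smoothness parameter in the domain $\mathcal{Y}_2$ (that is, $\mathcal{Y}$ with $d = 1$), and
let $1 \leq p_2, q_2 \leq +\infty$. Let $(\psi_{m,\ell})_{m \geq m_0-1,\ \ell=0,\ldots,2^m-1}$ be a (periodic) 1-dimensional compactly
supported orthonormal wavelet basis of $L^2(\mathcal{Y}_2)$, with the convention that $(\psi_{m_0-1,\ell})_{\ell=0,\ldots,2^{m_0}-1}$
denotes the scaling functions $(\phi_{m_0,\ell})_{\ell=0,\ldots,2^{m_0}-1}$, where $m_0$ is the coarse (primary) resolution
level. Assume that the corresponding 1-dimensional scaling function $\phi$ and the 1-dimensional
wavelet function $\psi$ are $r_2$-times continuously differentiable (regularity of the wavelet system
$(\phi, \psi)$) with $0 < s_2 < r_2$, and assume that $s_2 + 1/2 - 1/p_2 > 0$. Define the norm $\|\cdot\|^{s_2}_{p_2,q_2}$ by

$$ \|f\|^{s_2}_{p_2,q_2} = \left( \sum_{m=m_0-1}^{+\infty} 2^{m q_2 (s_2 + 1/2 - 1/p_2)} \left( \sum_{\ell=0}^{2^m-1} |\alpha_{m,\ell}|^{p_2} \right)^{q_2/p_2} \right)^{1/q_2}, $$

with the respective above sums replaced by a maximum if $p_2 = +\infty$ and/or $q_2 = +\infty$. Then, as
noticed above, one can define the following Besov ball of radius $A_2 > 0$:

$$ B^{s_2}_{p_2,q_2}(A_2) = \left\{ f \in L^2(\mathcal{Y}_2) : \|f\|^{s_2}_{p_2,q_2} \leq A_2 \right\}. $$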

4 Smoothness assumptions on the time-dependent multivariate
response function

The statistical problem that we consider below is the estimation of the unknown time-dependent
multivariate response function $f(t,x)$, $x \in X$, $t \in T$, based on observations from model (1.1).
Motivated by the practical application discussed in Section 2, in order to derive the asymptotic
(as $\varepsilon \to 0$) optimal (in the minimax sense) rates of convergence (for the $L^2$-risk), we consider the
following functional space to model $f(t,x)$, $x \in X$, $t \in T$.

First, let us assume that, for each $t \in T$, the mapping $x \mapsto f(t,x)$ belongs to $L^2(X)$. Let
$\Lambda = \{\lambda, |\lambda| = j\}_{j_0-1 \leq j < +\infty}$. For each $t \in T$, the (periodic) d-dimensional wavelet basis $(\psi_\lambda)_{\lambda \in \Lambda}$
is used to decompose $f(t,x)$ as

$$ f(t,x) = \sum_{j=j_0-1}^{+\infty} \sum_{|\lambda|=j} \alpha_\lambda(t)\,\psi_\lambda(x) \quad \text{with} \quad \alpha_\lambda(t) = \langle f(t,\cdot), \psi_\lambda \rangle_{L^2(X)}, \quad x \in X. \qquad (4.1) $$
Then, for each $\lambda \in \Lambda$, we assume that the mapping $t \mapsto \alpha_\lambda(t)$ belongs to $L^2(T)$. For each $\lambda \in \Lambda$,
the (periodic) 1-dimensional wavelet basis $(\psi_{m,\ell})_{m \geq m_0-1,\ \ell=0,\ldots,2^m-1}$ is used to decompose $\alpha_\lambda(t)$
as

$$ \alpha_\lambda(t) = \sum_{m=m_0-1}^{+\infty} \sum_{\ell=0}^{2^m-1} \alpha_{\lambda,m,\ell}\,\psi_{m,\ell}(t) \quad \text{with} \quad \alpha_{\lambda,m,\ell} = \langle \alpha_\lambda, \psi_{m,\ell} \rangle_{L^2(T)}, \quad t \in T. \qquad (4.2) $$

Finally, assuming that the mapping $(t,x) \mapsto f(t,x)$, $t \in T$, $x \in X$, belongs to $L^2(T \times X)$,
and considering the corresponding tensor-product wavelet basis, $f(t,x)$ can thus be
decomposed as

$$ f(t,x) = \sum_{j=j_0-1}^{+\infty} \sum_{|\lambda|=j} \sum_{m=m_0-1}^{+\infty} \sum_{\ell=0}^{2^m-1} \alpha_{\lambda,m,\ell}\,\psi_{m,\ell}(t)\,\psi_\lambda(x), \quad t \in T,\ x \in X. $$

We are now ready to introduce the following definition in order to characterize the smoothness
of the time-dependent multivariate function $f(t,x)$, $t \in T$, $x \in X$.

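Definition 1. Let $A_1 > 0$ and $A_2 > 0$ be constants. Let $s_1 > 0$ be a smoothness parameter in the
space domain $X$ and $s_2 > 0$ be a smoothness parameter in the time domain $T$, such that $0 < s_1 < r_1$
and $0 < s_2 < r_2$, where $r_1$ and $r_2$ are the regularity parameters of the wavelet systems $(\phi, \psi)$ in the
space and time domains, respectively. Let $1 \leq p_1, q_1 \leq +\infty$, $1 \leq p_2, q_2 \leq +\infty$, and assume that
$s_1 + d(1/2 - 1/p_1) > 0$ and $s_2 + 1/2 - 1/p_2 > 0$. Let $p = (p_1, p_2)$ and $q = (q_1, q_2)$. Define
$B^{s_1,s_2}_{p,q}(A_1, A_2)$ as the following ball of functions in $L^2(T \times X)$:

$$ B^{s_1,s_2}_{p,q}(A_1, A_2) = \left\{ f \in L^2(T \times X) \ \Big|\ \sup_{t \in T} \|f(t,\cdot)\|^{s_1}_{p_1,q_1} \leq A_1 \ \text{and} \ \|\alpha_\lambda\|^{s_2}_{p_2,q_2} \leq A_\lambda \ \text{for all } \lambda \in \Lambda \right\}, $$

where, for each $t \in T$,

$$ \|f(t,\cdot)\|^{s_1}_{p_1,q_1} = \left( \sum_{j=j_0-1}^{+\infty} 2^{j q_1 (s_1 + d(1/2 - 1/p_1))} \left( \sum_{|\lambda|=j} |\alpha_\lambda(t)|^{p_1} \right)^{q_1/p_1} \right)^{1/q_1}, $$

for each $\lambda \in \Lambda$,

$$ \|\alpha_\lambda\|^{s_2}_{p_2,q_2} = \left( \sum_{m=m_0-1}^{+\infty} 2^{m q_2 (s_2 + 1/2 - 1/p_2)} \left( \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^{p_2} \right)^{q_2/p_2} \right)^{1/q_2}, $$

with

$$ \alpha_{\lambda,m,\ell} = \int_{T \times X} f(t,x)\,\psi_{m,\ell}(t)\,\psi_\lambda(x)\,dt\,dx, $$

and $(A_\lambda)_{\lambda \in \Lambda}$ is a set of positive constants such that

$$ \sum_{j=j_0-1}^{+\infty} \sum_{|\lambda|=j} A_\lambda^2 \leq A_2^2. \qquad (4.3) $$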
Assuming that $f \in B^{s_1,s_2}_{p,q}(A_1, A_2)$ means that the multivariate function $f(t,\cdot)$ belongs
to $B^{s_1}_{p_1,q_1}(A_1)$, uniformly over $t \in T$. This assumption also means that the smoothness of the
wavelet coefficients $(\alpha_\lambda(\cdot))_{\lambda \in \Lambda}$ over time $t \in T$ is measured by the parameter $s_2$ through a
Besov ball $B^{s_2}_{p_2,q_2}(A_\lambda)$ whose radius satisfies equation (4.3). It implies that $\sup_{\lambda \in \Lambda} \{A_\lambda\} \leq A_2$
and, more importantly, that $\lim_{j \to +\infty} \sum_{|\lambda|=j} A_\lambda^2 = 0$, so that the Besov norm $\|\alpha_\lambda\|^{s_2}_{p_2,q_2}$ goes to
zero as the resolution level of the time-dependent wavelet coefficients $\alpha_\lambda(\cdot)$ goes to infinity.
In practical applications, it corresponds to the assumption that the high-resolution energy of a
time-dependent multivariate function, when integrated over time, is going to zero. (In order to
simplify the notation, we have dropped the dependence of $B^{s_1,s_2}_{p,q}(A_1, A_2)$ on $(A_\lambda)_{\lambda \in \Lambda}$.)

To motivate the definition of the functional space $B^{s_1,s_2}_{p,q}(A_1, A_2)$, let us consider the
real-data example on satellite remote sensing data discussed in Section 2. For this time-dependent
2-dimensional image, we display in Figure 4.2 the curve $t \mapsto \alpha_\lambda(t)$ for two types of wavelet
coefficients, one at a low resolution level ($|\lambda| = 3$) and another one at the highest resolution level
($|\lambda| = 5$). Clearly, the curve at the highest resolution level has a smaller amplitude, which is
consistent with the decay of $A_\lambda$ as $|\lambda|$ increases in the definition of $B^{s_1,s_2}_{p,q}(A_1, A_2)$. Moreover,
given the shape of the curves in Figure 4.2, it seems reasonable to assume that the functions $\alpha_\lambda(\cdot)$
have the same degree of smoothness $s_2$ across different resolution levels.




Figure 4.2: Satellite remote sensing image. Evolution of the curve $t \mapsto \alpha_\lambda(t)$ for (a) a wavelet
coefficient at resolution level $|\lambda| = 3$, and (b) a wavelet coefficient at resolution level $|\lambda| = 5$.



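In order to derive the minimax results, we define the minimax $L^2$-risk over the class of balls
$B^{s_1,s_2}_{p,q}(A_1, A_2)$ as

$$ \mathcal{R}_\varepsilon(B^{s_1,s_2}_{p,q}(A_1, A_2)) := \inf_{\hat f_\varepsilon} \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} R(\hat f_\varepsilon, f) = \inf_{\hat f_\varepsilon} \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} E\|\hat f_\varepsilon - f\|^2 $$

$$ = \inf_{\hat f_\varepsilon} \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} E \int_{T \times X} |\hat f_\varepsilon(t,x) - f(t,x)|^2 \,dt\,dx, $$

where $\|g\|$ is the $L^2$-norm of a function $g$ defined on $T \times X$ and the infimum is taken over all
possible estimators (i.e., measurable functions) $\hat f_\varepsilon$ of $f$, based on observations from model (1.1).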

To present our results, for any $d \geq 1$ and any $s_1 > 0$ and $s_2 > 0$, we define $s > 0$ to be such
that

$$ \frac{1}{s} = \frac{1}{d+1} \left( \frac{d}{s_1} + \frac{1}{s_2} \right). \qquad (4.4) $$
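As a small numerical illustration, the following Python sketch evaluates the effective smoothness $s$ and the exponent $4s/(2s+d+1)$ of $\varepsilon$ that appears in the rates of this paper. It assumes that (4.4) reads $1/s = (d/s_1 + 1/s_2)/(d+1)$, which is the form that balances the squared bias and the variance of the linear estimator of Section 6.2. The function names are hypothetical and are not taken from the paper.

```python
def effective_smoothness(s1: float, s2: float, d: int) -> float:
    """Solve the assumed form of (4.4), 1/s = (d/s1 + 1/s2) / (d + 1), for s."""
    if s1 <= 0 or s2 <= 0 or d < 1:
        raise ValueError("need s1 > 0, s2 > 0 and d >= 1")
    return (d + 1) / (d / s1 + 1.0 / s2)


def rate_exponent(s1: float, s2: float, d: int) -> float:
    """Exponent of epsilon in the rate epsilon^(4s / (2s + d + 1))."""
    s = effective_smoothness(s1, s2, d)
    return 4.0 * s / (2.0 * s + d + 1)


if __name__ == "__main__":
    # Equal smoothness in space and time: s coincides with the common value.
    print(effective_smoothness(2.0, 2.0, 2))
    print(rate_exponent(2.0, 2.0, 2))
    # Rough in time, smooth in space: s is pulled towards the smaller parameter.
    print(effective_smoothness(3.0, 0.5, 2))
```

When $s_1 = s_2$, the sketch returns $s = s_1$, so the exponent reduces to the familiar one for a function of $d+1$ variables with a single smoothness index.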

In what follows, we use the symbol $C$ for a generic positive constant, independent of $\varepsilon$, which
may take different values at different places. Moreover, in order to simplify the presentation of
the results, and without loss of generality, we assume below that $j_0 = m_0 = 0$.



5 Minimax lower bound for the $L^2$-risk

The following statement provides the minimax lower bounds for the $L^2$-risk.

Theorem 1. Let $A_1 > 0$ and $A_2 > 0$ be constants. Let $(A_\lambda)_{\lambda \in \Lambda}$ be a set of positive constants
satisfying (4.3), and assume that there exists a positive constant $A > 0$ such that, for any
$-1 \leq j < +\infty$ and $|\lambda| = j$,

$$ A_\lambda\, 2^{\frac{d}{2}(j+1)} \geq A. \qquad (5.1) $$

Let $s_1 > 0$ and $s_2 > 0$ be the smoothness parameters in the space and time domains, respectively,
such that $0 < s_1 < r_1$ and $0 < s_2 < r_2$, where $r_1$ and $r_2$ are the regularity parameters of the
wavelet systems $(\phi, \psi)$ in the space and time domains, respectively. Assume that $1 \leq p_1, q_1 \leq +\infty$,
$1 \leq p_2, q_2 \leq +\infty$ are such that $s_1 + d(1/2 - 1/p_1) > 0$ and $s_2 + 1/2 - 1/p_2 > 0$, and let $s > 0$ satisfy (4.4). Then,
there exists a constant $C > 0$ such that

$$ \mathcal{R}_\varepsilon(B^{s_1,s_2}_{p,q}(A_1, A_2)) \geq C\, \varepsilon^{\frac{4s}{2s+d+1}}, $$

for all sufficiently small $\varepsilon > 0$.

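Proof. The proof is based on the standard Assouad's cube technique (see, e.g., [Tsybakov, 2009],
Chapter 2, Section 2.7.2). Consider the following test functions

$$ f_w(t,x) = \mu_{j_1,m_2} \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} w_{\lambda,m,\ell}\,\psi_{m,\ell}(t)\,\psi_\lambda(x), \quad t \in T,\ x \in X, $$

where $w = (w_{\lambda,m,\ell})_{|\lambda|=j,\ 0 \leq \ell \leq 2^m-1,\ -1 \leq j \leq j_1,\ -1 \leq m \leq m_2} \in \Omega := \{1,-1\}^{2^{(j_1+1)d+m_2+1}}$ and $\mu_{j_1,m_2}$ is a
positive sequence of reals satisfying the condition

$$ \mu_{j_1,m_2} = c\, 2^{-\frac{1}{2}(m_2+1)}\, 2^{-\frac{d}{2}(j_1+1)} \min\left( 2^{-(j_1+1)s_1},\ 2^{-(m_2+1)s_2} \right), \qquad (5.2) $$

for some constant $c > 0$ not depending on $j_1$ and $m_2$. Assume that $c$ satisfies the condition

$$ c \leq \min\left( A_1 / (\|\psi_{-1}\|_\infty + K\|\psi\|_\infty),\ A \right), \qquad (5.3) $$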
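where $A$ is the constant satisfying inequality (5.1) and $K$ is a constant that is proportional to
the length of the support of $\psi$. Then, it easily follows that $f_w \in B^{s_1,s_2}_{p,q}(A_1, A_2)$ for any $w \in \Omega$. Indeed,
for any $t \in T$,

$$ \|f_w(t,\cdot)\|^{s_1}_{p_1,q_1} = \left( \sum_{j=-1}^{j_1} 2^{j q_1 (s_1 + d(1/2 - 1/p_1))} \left( \sum_{|\lambda|=j} |\langle f_w(t,\cdot), \psi_\lambda \rangle|^{p_1} \right)^{q_1/p_1} \right)^{1/q_1}, $$

where

$$ |\langle f_w(t,\cdot), \psi_\lambda \rangle| \leq \mu_{j_1,m_2} \left( \|\psi_{-1}\|_\infty + \sum_{m=0}^{m_2} \sum_{\ell=0}^{2^m-1} |\psi_{m,\ell}(t)| \right). $$

Let us define the set

$$ I_m(t) = \{ 0 \leq \ell \leq 2^m - 1 : \psi_{m,\ell}(t) \neq 0 \}. $$

Since the wavelet $\psi$ is compactly supported, the cardinality of $I_m(t)$ is bounded
by a constant $K > 0$ that is proportional to the length of the support of $\psi$. Thus, using the relation
$\|\psi_{m,\ell}\|_\infty = 2^{m/2} \|\psi\|_\infty$, we obtain that, for any $t \in T$,

$$ |\langle f_w(t,\cdot), \psi_\lambda \rangle| \leq \mu_{j_1,m_2} \left( \|\psi_{-1}\|_\infty + \sum_{m=0}^{m_2} K\, 2^{m/2} \|\psi\|_\infty \right) $$

$$ \leq \mu_{j_1,m_2} \left( \|\psi_{-1}\|_\infty + K \|\psi\|_\infty\, 2^{\frac{1}{2}(m_2+1)} \right) $$

$$ \leq \mu_{j_1,m_2} \left( \|\psi_{-1}\|_\infty + K \|\psi\|_\infty \right) 2^{\frac{1}{2}(m_2+1)}. $$

Therefore, by the definition of $\mu_{j_1,m_2}$ given in (5.2), it follows that

$$ \sup_{t \in T} \|f_w(t,\cdot)\|^{s_1}_{p_1,q_1} \leq \left( \|\psi_{-1}\|_\infty + K\|\psi\|_\infty \right) \mu_{j_1,m_2}\, 2^{\frac{1}{2}(m_2+1)}\, 2^{(j_1+1)(s_1+d/2)} $$

$$ \leq c \left( \|\psi_{-1}\|_\infty + K\|\psi\|_\infty \right). \qquad (5.4) $$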
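Now, define, for each $\lambda \in \Lambda$ and $t \in T$,

$$ \alpha_{w,\lambda}(t) = \mu_{j_1,m_2} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} w_{\lambda,m,\ell}\,\psi_{m,\ell}(t), $$

where $w = (w_{\lambda,m,\ell})$ and $\mu_{j_1,m_2}$ are as given above. On noting that, by the orthonormality of the wavelet basis,

$$ |\langle \alpha_{w,\lambda}, \psi_{m,\ell} \rangle_{L^2(T)}| = \mu_{j_1,m_2}, \quad -1 \leq m \leq m_2, $$

and using the definition of $\mu_{j_1,m_2}$ given in (5.2) and the inequality (5.1), we obtain that, for any
$|\lambda| = j$ with $-1 \leq j \leq j_1$,

$$ \|\alpha_{w,\lambda}\|^{s_2}_{p_2,q_2} \leq \mu_{j_1,m_2}\, 2^{(m_2+1)(s_2+1/2)} \leq c\, 2^{-\frac{d}{2}(j_1+1)} \leq c\, A_\lambda / A. \qquad (5.5) $$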
Hence, using the inequalities (5.4) and (5.5), it follows that the condition (5.3) is sufficient to
imply that $f_w \in B^{s_1,s_2}_{p,q}(A_1, A_2)$.

In the rest of the proof, we will thus assume that condition (5.3) holds. Furthermore, we
use the notation $E_{f_w}$ to denote expectation with respect to the distribution $P_{f_w}$ of the random
process $Y$ in model (1.1) under the hypothesis that $f = f_w$.

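The minimax risk $\mathcal{R}_\varepsilon(B^{s_1,s_2}_{p,q}(A_1, A_2))$ can be bounded from below as follows:

$$ \mathcal{R}_\varepsilon := \mathcal{R}_\varepsilon(B^{s_1,s_2}_{p,q}(A_1, A_2)) \geq \inf_{\hat f_\varepsilon} \sup_{w \in \Omega} R(\hat f_\varepsilon, f_w) $$

$$ \geq \inf_{\hat f_\varepsilon} \sup_{w \in \Omega} E_{f_w} \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} |\hat\alpha_{\lambda,m,\ell} - \mu_{j_1,m_2} w_{\lambda,m,\ell}|^2, $$

where

$$ \hat\alpha_{\lambda,m,\ell} = \int_{T \times X} \hat f_\varepsilon(t,x)\,\psi_{m,\ell}(t)\,\psi_\lambda(x)\,dt\,dx. $$

Then, define

$$ \hat w_{\lambda,m,\ell} := \arg\min_{v \in \{-1,1\}} |\hat\alpha_{\lambda,m,\ell} - \mu_{j_1,m_2} v|, $$

and remark that the triangle inequality and the definition of $\hat w_{\lambda,m,\ell}$ imply that

$$ \mu_{j_1,m_2} |\hat w_{\lambda,m,\ell} - w_{\lambda,m,\ell}| \leq 2\, |\hat\alpha_{\lambda,m,\ell} - \mu_{j_1,m_2} w_{\lambda,m,\ell}|, $$

which yields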

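$$ \mathcal{R}_\varepsilon \geq \inf_{\hat w} \sup_{w \in \Omega} \frac{\mu_{j_1,m_2}^2}{4} \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} E_{f_w} |\hat w_{\lambda,m,\ell} - w_{\lambda,m,\ell}|^2 $$

$$ \geq \frac{\mu_{j_1,m_2}^2}{4} \inf_{\hat w} \frac{1}{\#\Omega} \sum_{w \in \Omega} \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} E_{f_w} |\hat w_{\lambda,m,\ell} - w_{\lambda,m,\ell}|^2. $$

Replacing the sums $\sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1}$ by $\sum_{\lambda,m,\ell}$ (to simplify the notation), for
any $\lambda$, $m$, $\ell$ and $w \in \Omega$, define the vector $w^{(\lambda,m,\ell)} \in \Omega$ having all its components equal to those of $w$ except
the $(\lambda,m,\ell)$-th element. Let $\#A$ denote the cardinality of a finite set $A$. Then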

> inf ^^'^^ Yl Yl (%.J^W-^A,m,^P + %^(;,.,„,,) |^i'W-iyi'^;^/^M 



,2 



(A,m,£) 



. ■ r /^ii,ni2 1 w / I |2 , -e (A,m,£) ^ ^'^/„(A,m,£) 
- '?^^^^#f^Z^ FA,m,£-«^A,m,£| + W^A,m,^-<™/ (^) 



Since $w^{(\lambda,m,\ell)}_{\lambda,m,\ell} = -w_{\lambda,m,\ell}$ and $\hat w_{\lambda,m,\ell} \in \{-1, 1\}$, one finally obtains that, for any $0 < \delta < 1$,



2 1 ^ 

ji,m2 /^j^ ^ ^ ( I 

"^h,m2 Z-^ f^' \ dFf 



> <^A^Lm™>: >: F/j ':"^ (n>^ • (5.6) 
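Thanks to the multiparameter Girsanov's formula, one has that, under the hypothesis that
$f = f_w$ in model (1.1),

$$ \log\left[ \frac{dP_{f_{w^{(\lambda,m,\ell)}}}}{dP_{f_w}}(Y) \right] = \frac{1}{\varepsilon} \int_{T \times X} \left( f_{w^{(\lambda,m,\ell)}} - f_w \right)(t,x)\,dW(t,x) - \frac{1}{2\varepsilon^2} \int_{T \times X} \left| f_{w^{(\lambda,m,\ell)}} - f_w \right|^2(t,x)\,dt\,dx. $$

Therefore, the random variable

$$ Z_{\lambda,m,\ell} := \log\left[ \frac{dP_{f_{w^{(\lambda,m,\ell)}}}}{dP_{f_w}}(Y) \right] $$

is Gaussian with mean $\theta = -2\varepsilon^{-2}\mu_{j_1,m_2}^2$ and variance $\sigma^2 = 4\varepsilon^{-2}\mu_{j_1,m_2}^2$ that do not depend on
$(\lambda,m,\ell)$.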

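Now, let $s > 0$ satisfy (4.4). Define $j_1 = j_1(\varepsilon)$ and $m_2 = m_2(\varepsilon)$ as

$$ 2^{j_1(\varepsilon)+1} = \left\lfloor \varepsilon^{-\frac{2s}{(2s+d+1)s_1}} \right\rfloor, \qquad 2^{m_2(\varepsilon)+1} = \left\lfloor \varepsilon^{-\frac{2s}{(2s+d+1)s_2}} \right\rfloor. \qquad (5.7) $$

Thanks to (5.2), it follows that there exists $c_1 > 0$ such that

$$ 4\varepsilon^{-2}\mu_{j_1,m_2}^2 \leq c_1, $$

for all sufficiently small $\varepsilon > 0$. Hence, $Z_{\lambda,m,\ell} \sim N(\theta, \sigma^2)$ with $|\theta| \leq c_1/2$ and $\sigma^2 \leq c_1$, which
implies that there exist $\gamma > 0$ and $0 < \delta < 1$, that do not depend on $(\lambda,m,\ell)$ and $\varepsilon$, such that
for all sufficiently small $\varepsilon > 0$

$$ P_{f_w}\left( \log\left[ \frac{dP_{f_{w^{(\lambda,m,\ell)}}}}{dP_{f_w}}(Y) \right] \geq \log(\delta) \right) \geq \gamma. $$

Hence, inserting the above inequality into (5.6), it implies that

$$ \mathcal{R}_\varepsilon \geq C\, \mu_{j_1,m_2}^2\, 2^{(j_1+1)d + m_2 + 1}. $$

Using the expressions of $j_1(\varepsilon)$ and $m_2(\varepsilon)$ given in (5.7), together with (4.4) and (5.2), we finally
obtain that there exists a constant $C > 0$, that does not depend on $\varepsilon$, such that

$$ \mathcal{R}_\varepsilon \geq C\, \varepsilon^{\frac{4s}{2s+d+1}}, $$

for all sufficiently small $\varepsilon > 0$, thus completing the proof of the theorem.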

□ 



6 Minimax upper bound for the $L^2$-risk

We now provide minimax upper bounds for the $L^2$-risk. This will be accomplished by constructing
appropriate estimators of $f(t,x)$, $t \in T$, $x \in X$, in the sequence space model.
6.1 The sequence space model

The estimators suggested in the following sections will be constructed in the sequence space.
Let us first recall that (see, e.g., [Chow et al., 2001]) (1.1) must be understood in the following
sense: for any $g \in L^2(T \times X)$,

$$ \int_{T \times X} g(t,x)\,dY(t,x) = \int_{T \times X} g(t,x) f(t,x)\,dt\,dx + \varepsilon \int_{T \times X} g(t,x)\,dW(t,x), $$

so that the integral of "the data" $dY(t,x)$ with respect to $g(t,x)$ is a random variable that is
normally distributed with mean

$$ E\left( \int_{T \times X} g(t,x)\,dY(t,x) \right) = \int_{T \times X} g(t,x) f(t,x)\,dt\,dx $$

and variance

$$ \mathrm{Var}\left( \int_{T \times X} g(t,x)\,dY(t,x) \right) = \varepsilon^2 \int_{T \times X} |g(t,x)|^2 \,dt\,dx. $$

Moreover, for any $g_1, g_2 \in L^2(T \times X)$,

$$ E\left( \int_{T \times X} g_1(t,x)\,dW(t,x) \int_{T \times X} g_2(t,x)\,dW(t,x) \right) = \int_{T \times X} g_1(t,x)\, g_2(t,x)\,dt\,dx. $$

Hence, in view of the above and using the tensor-product wavelet basis constructed in Section 3,
noisy observations of the coefficients $\alpha_{\lambda,m,\ell}$ are obtained through the following sequence
model

$$ y_{\lambda,m,\ell} = \int_{T \times X} \psi_{m,\ell}(t)\,\psi_\lambda(x)\,dY(t,x) \qquad (6.1) $$

$$ = \alpha_{\lambda,m,\ell} + \varepsilon\, z_{\lambda,m,\ell}, \quad \lambda \in \Lambda,\ m \geq -1,\ \ell = 0, 1, \ldots, 2^m - 1, $$

where the $z_{\lambda,m,\ell}$ are independent and identically distributed (i.i.d.) standard Gaussian random
variables, i.e., Gaussian random variables with zero mean and variance 1.
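To make the sequence model concrete, here is a minimal Python sketch that draws observations $y = \alpha + \varepsilon z$ for a toy set of coefficients and checks that the empirical noise level is close to $\varepsilon$. The coefficient array, its geometric decay, the index layout and all function names are hypothetical choices made for illustration only; they are not the data or the code used in the paper.

```python
import random


def toy_coefficients(n_lambda=4, max_level=9, decay=1.5):
    """Hypothetical coefficients alpha[(lam, m, l)] decaying with the level m."""
    return {
        (lam, m, l): 2.0 ** (-decay * (m + 1))
        for lam in range(n_lambda)
        for m in range(max_level + 1)
        for l in range(2 ** m)
    }


def simulate_sequence_model(alpha, eps, seed=0):
    """Draw y = alpha + eps * z with i.i.d. standard Gaussian z."""
    rng = random.Random(seed)
    return {idx: a + eps * rng.gauss(0.0, 1.0) for idx, a in alpha.items()}


def empirical_noise_level(alpha, y):
    """Root mean squared deviation of y from alpha; it should be close to eps."""
    n = len(alpha)
    return (sum((y[idx] - alpha[idx]) ** 2 for idx in alpha) / n) ** 0.5


if __name__ == "__main__":
    alpha = toy_coefficients()
    y = simulate_sequence_model(alpha, eps=0.1)
    print(len(alpha), empirical_noise_level(alpha, y))
```

Working directly with coefficients is enough to experiment with the estimators below, because the wavelet basis is orthonormal and the $L^2$-risk equals the sum of squared coefficient errors.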

6.2 Linear and non-adaptive estimator

Consider the sequence space model (6.1). Let $j_1 \geq 0$ and $m_2 \geq 0$ be integers (smoothing
parameters). We consider the following non-adaptive wavelet projection (linear) estimator of
$f(t,x)$, $t \in T$, $x \in X$, that is

$$ \hat f_{j_1,m_2}(t,x) = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} y_{\lambda,m,\ell}\,\psi_{m,\ell}(t)\,\psi_\lambda(x), \quad t \in T,\ x \in X. \qquad (6.2) $$

Define the $L^2$-risk of $\hat f_{j_1,m_2}$ as

$$ R(\hat f_{j_1,m_2}, f) = E\|\hat f_{j_1,m_2} - f\|^2_{L^2(T \times X)} = E\left( \int_{T \times X} |\hat f_{j_1,m_2}(t,x) - f(t,x)|^2 \,dt\,dx \right). $$

The following statement provides the minimax upper bounds for the $L^2$-risk of the
non-adaptive (linear) wavelet estimator $\hat f_{j_1,m_2}$ given in (6.2).
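Theorem 2. Let $A_1 > 0$ and $A_2 > 0$ be constants. Let $s_1 > 0$ and $s_2 > 0$ be the smoothness
parameters in the space and time domains, respectively, such that $0 < s_1 < r_1$ and $0 < s_2 < r_2$,
where $r_1$ and $r_2$ are the regularity parameters of the wavelet systems $(\phi, \psi)$ in the space and time
domains, respectively. Assume that $2 \leq p_1, q_1 \leq +\infty$, $2 \leq p_2, q_2 \leq +\infty$ are such that
$s_1 + d(1/2 - 1/p_1) > 0$ and $s_2 + 1/2 - 1/p_2 > 0$, and let $s > 0$ satisfy (4.4). Consider the linear
estimator $\hat f_{j_1,m_2}$ given in (6.2), and define $j_1 = j_1(\varepsilon)$ and $m_2 = m_2(\varepsilon)$ such that

$$ 2^{j_1(\varepsilon)+1} = \left\lfloor \varepsilon^{-\frac{2s}{(2s+d+1)s_1}} \right\rfloor \quad \text{and} \quad 2^{m_2(\varepsilon)+1} = \left\lfloor \varepsilon^{-\frac{2s}{(2s+d+1)s_2}} \right\rfloor. \qquad (6.3) $$

Then, there exists a constant $C > 0$ such that

$$ \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} R(\hat f_{j_1(\varepsilon),m_2(\varepsilon)}, f) \leq C\, \varepsilon^{\frac{4s}{2s+d+1}}, $$

for all sufficiently small $\varepsilon > 0$.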
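The choice of resolution levels in the theorem can be checked numerically. The Python sketch below assumes that (6.3) sets $2^{j_1+1}$ and $2^{m_2+1}$ to (the integer part of) $\varepsilon^{-2s/((2s+d+1)s_1)}$ and $\varepsilon^{-2s/((2s+d+1)s_2)}$, picks the largest admissible integer levels, and compares the two squared-bias terms and the variance term (all without constants) with the rate $\varepsilon^{4s/(2s+d+1)}$. The function names and the rounding convention are hypothetical.

```python
import math


def effective_smoothness(s1, s2, d):
    """Assumed form of (4.4): 1/s = (d/s1 + 1/s2) / (d + 1)."""
    return (d + 1) / (d / s1 + 1.0 / s2)


def linear_levels(eps, s1, s2, d):
    """Largest integers j1, m2 with 2^(j1+1), 2^(m2+1) not exceeding the assumed targets."""
    s = effective_smoothness(s1, s2, d)
    denom = 2 * s + d + 1
    target_space = eps ** (-2 * s / (denom * s1))
    target_time = eps ** (-2 * s / (denom * s2))
    j1 = math.floor(math.log2(target_space)) - 1
    m2 = math.floor(math.log2(target_time)) - 1
    return j1, m2


def risk_terms(eps, s1, s2, d, j1, m2):
    """Squared-bias terms (constants dropped) and the variance eps^2 * 2^((j1+1)d + m2 + 1)."""
    bias_space = 2.0 ** (-2 * (j1 + 1) * s1)
    bias_time = 2.0 ** (-2 * (m2 + 1) * s2)
    variance = eps ** 2 * 2.0 ** ((j1 + 1) * d + m2 + 1)
    return bias_space, bias_time, variance


if __name__ == "__main__":
    eps, s1, s2, d = 1e-3, 2.0, 1.0, 2
    j1, m2 = linear_levels(eps, s1, s2, d)
    s = effective_smoothness(s1, s2, d)
    rate = eps ** (4 * s / (2 * s + d + 1))
    # Each ratio stays bounded as eps decreases, which is the content of the balance.
    print(j1, m2, [term / rate for term in risk_terms(eps, s1, s2, d, j1, m2)])
```

Because the levels are rounded down to integers, the three terms match the rate only up to constant factors, which is all that the theorem requires.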

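Proof. Let us write the usual bias-variance decomposition of the $L^2$-risk as

$$ R(\hat f_{j_1,m_2}, f) = B(\hat f_{j_1,m_2}, f) + V(\hat f_{j_1,m_2}), $$

with

$$ B(\hat f_{j_1,m_2}, f) = \|E\hat f_{j_1,m_2} - f\|^2 \quad \text{and} \quad V(\hat f_{j_1,m_2}) = E\|\hat f_{j_1,m_2} - E\hat f_{j_1,m_2}\|^2. $$

Obviously,

$$ V(\hat f_{j_1,m_2}) = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{\ell=0}^{2^m-1} \varepsilon^2 = \varepsilon^2\, 2^{(j_1+1)d + m_2 + 1}, \qquad (6.4) $$

and

$$ B(\hat f_{j_1,m_2}, f) = B_1(\hat f_{j_1,m_2}, f) + B_2(\hat f_{j_1,m_2}, f), $$

where

$$ B_1(\hat f_{j_1,m_2}, f) = \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} \sum_{m=-1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 = \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} \int_T |\alpha_\lambda(t)|^2 \,dt = \int_T \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \,dt $$

and

$$ B_2(\hat f_{j_1,m_2}, f) = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=m_2+1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 \leq \sum_{j=-1}^{+\infty} \sum_{|\lambda|=j} \sum_{m=m_2+1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2. $$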
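By Lemma 1, there exists a constant $K_2 > 0$ (only depending on $s_2$ and $p_2$) such that

$$ \sum_{m=m_2+1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 \leq K_2 A_\lambda^2\, 2^{-2(m_2+1)s_2}. $$

Thus, using (4.3), it follows that

$$ B_2(\hat f_{j_1,m_2}, f) \leq K_2 A_2^2\, 2^{-2(m_2+1)s_2}. \qquad (6.5) $$

Moreover, by Lemma 1, there exists a constant $K_1 > 0$ (only depending on $s_1$ and $p_1$) such
that

$$ \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \leq K_1 A_1^2\, 2^{-2(j_1+1)s_1}. $$

This implies that

$$ B(\hat f_{j_1,m_2}, f) \leq K_1 A_1^2\, 2^{-2(j_1+1)s_1} + K_2 A_2^2\, 2^{-2(m_2+1)s_2}. \qquad (6.6) $$

Therefore, by combining (6.4), (6.5) and (6.6), we arrive at

$$ R(\hat f_{j_1(\varepsilon),m_2(\varepsilon)}, f) \leq K_1 A_1^2\, 2^{-2(j_1(\varepsilon)+1)s_1} + K_2 A_2^2\, 2^{-2(m_2(\varepsilon)+1)s_2} + \varepsilon^2\, 2^{(j_1(\varepsilon)+1)d + m_2(\varepsilon) + 1}. $$

By taking into account the expressions of $j_1(\varepsilon)$ and $m_2(\varepsilon)$ given in (6.3), together with (4.4), we
finally obtain that there exists a constant $C > 0$, that does not depend on $\varepsilon$, such that

$$ \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} R(\hat f_{j_1(\varepsilon),m_2(\varepsilon)}, f) \leq C\, \varepsilon^{\frac{4s}{2s+d+1}}, $$

for all sufficiently small $\varepsilon > 0$, thus completing the proof of the theorem. □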

The choice of the resolution levels $j_1$ and $m_2$ depends on the unknown smoothness parameters
$s_1$ and $s_2$ in the space and time domains, respectively. The linear estimator $\hat f_{j_1,m_2}$ defined in
(6.2) is thus called non-adaptive (with respect to $s_1$ and $s_2$) and is of limited interest in practical
applications. Moreover, the results of Theorem 2 are only suited to model d-dimensional functions
$f(t,\cdot)$ belonging to the space $B^{s_1}_{p_1,q_1}(A_1)$ with $2 \leq p_1, q_1 \leq +\infty$, uniformly over $t \in T$. However,
such Besov spaces are not suited to model spatially inhomogeneous multivariate functions.

In the following section, we thus consider the problem of constructing an adaptive non-linear
estimator that is optimal (in the minimax sense) over Besov balls $B^{s_1,s_2}_{p,q}(A_1, A_2)$ with
$1 \leq p_1, q_1 \leq +\infty$ and $1 \leq p_2, q_2 \leq +\infty$.

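6.3 Non-linear and adaptive estimation

Consider the sequence space model (6.1). For each $\lambda \in \Lambda$, we divide the wavelet coefficients
$\alpha_{\lambda,m,\ell}$ at each resolution level $-1 \leq m < +\infty$ into blocks of length $L_\varepsilon = 1 + \lfloor \log(\varepsilon^{-2}) \rfloor$. Let $A_m$
and $U_{mr}$ be the following sets of indices

$$ A_m = \{ r \mid r = 1, 2, \ldots \}, $$

$$ U_{mr} = \{ \ell \mid \ell = 0, 1, \ldots, 2^m - 1;\ (r-1)L_\varepsilon \leq \ell \leq rL_\varepsilon - 1 \}. $$

Now, we define

$$ B_{\lambda,m,r} = \sum_{\ell \in U_{mr}} \alpha_{\lambda,m,\ell}^2 \quad \text{and} \quad \hat B_{\lambda,m,r} = \sum_{\ell \in U_{mr}} y_{\lambda,m,\ell}^2. \qquad (6.7) $$

We consider an adaptive wavelet block-thresholding (non-linear) estimator of $f(t,x)$, $t \in T$, $x \in X$,
that is

$$ \hat f^{nl}_{j_1,m_2}(t,x) = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} y_{\lambda,m,\ell}\, 1_{\{\hat B_{\lambda,m,r} > t_{\varepsilon,\delta}\}}\, \psi_{m,\ell}(t)\,\psi_\lambda(x), \quad t \in T,\ x \in X, \qquad (6.8) $$

where $1_A$ is the indicator function of the set $A$, and the resolution levels $j_1$ and $m_2$, and the
threshold $t_{\varepsilon,\delta}$, will be defined below.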
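The block-thresholding rule can be sketched in a few lines of Python for the coefficients attached to one pair $(\lambda, m)$. The sketch keeps a whole block of noisy coefficients when its empirical energy exceeds a threshold and sets it to zero otherwise. Several choices here are assumptions made for illustration: the logarithm in the block length is taken to be the natural logarithm, the threshold is taken proportional to $\varepsilon^2 L_\varepsilon$ with a hypothetical tuning constant `kappa` (the exact normalization used in the paper is not reproduced here), and a shorter final block is simply treated as a block of its own.

```python
import math


def block_length(eps):
    """Block length 1 + floor(log(eps^-2)); the natural logarithm is an assumption."""
    return 1 + math.floor(math.log(eps ** -2))


def block_threshold(y, eps, kappa):
    """Keep or kill blocks of the list y according to their empirical energy.

    y     : noisy coefficients for one (lambda, m), ordered by the index l
    eps   : noise level
    kappa : hypothetical tuning constant; threshold = kappa * eps^2 * block length
    """
    length = block_length(eps)
    threshold = kappa * eps ** 2 * length
    out = []
    for start in range(0, len(y), length):
        block = y[start:start + length]
        energy = sum(v * v for v in block)  # empirical block energy
        out.extend(block if energy > threshold else [0.0] * len(block))
    return out


if __name__ == "__main__":
    eps = 0.1
    y = [1.0] * 5 + [0.01] * 5  # one strong block followed by one weak block
    print(block_length(eps))
    print(block_threshold(y, eps, kappa=3.0))
```

Thresholding blocks rather than single coefficients pools information across neighbouring coefficients, which is what allows the threshold to be of order $\varepsilon^2 L_\varepsilon$ per block rather than carrying a logarithmic factor per coefficient.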

Define the $L^2$-risk of $\hat f^{nl}_{j_1,m_2}$ as

$$ R(\hat f^{nl}_{j_1,m_2}, f) = E\|\hat f^{nl}_{j_1,m_2} - f\|^2_{L^2(T \times X)} = E\left( \int_{T \times X} |\hat f^{nl}_{j_1,m_2}(t,x) - f(t,x)|^2 \,dt\,dx \right). $$

The following statement provides the minimax upper bounds for the $L^2$-risk of the adaptive
(non-linear) wavelet estimator $\hat f^{nl}_{j_1,m_2}$ given in (6.8).

Theorem 3. Let $A_1 > 0$ and $A_2 > 0$ be constants. Let $s_1 > 0$ and $s_2 > 0$ be the smoothness
parameters in the space and time domains, respectively, such that $0 < s_1 < r_1$ and $0 < s_2 < r_2$,
where $r_1$ and $r_2$ are the regularity parameters of the wavelet systems $(\phi, \psi)$ in the space and time
domains, respectively. Assume that $1 \leq p_1, q_1 \leq +\infty$, $1 \leq p_2, q_2 \leq +\infty$ are such that
$s_1 + d(1/2 - 1/p_1) > 0$ and $s_2 + 1/2 - 1/p_2 > 0$ if $2 \leq p_1, q_1 \leq +\infty$ and $2 \leq p_2, q_2 \leq +\infty$,
respectively, and $s_1 > d/p_1$ and $s_2 > 1/p_2$ if $1 \leq p_1, q_1 < 2$ and $1 \leq p_2, q_2 < 2$, respectively.
Let also $s > 0$ satisfy (4.4). Consider the non-linear estimator $\hat f^{nl}_{j_1,m_2}$ given in (6.8), and
define $j_1 = j_1(\varepsilon)$ and $m_2 = m_2(\varepsilon)$ as

2(iiW+i) = Le-2j and 2(™2(^)+i) = [e^'j . (6.9) 

Define the threshold 

te,S = S ^ L^, 

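for some $\delta > 2(2\sqrt{2} + 1)$. Then, there exists a constant $C > 0$ such that

$$ \sup_{f \in B^{s_1,s_2}_{p,q}(A_1,A_2)} R(\hat f^{nl}_{j_1(\varepsilon),m_2(\varepsilon)}, f) \leq C\, \varepsilon^{\frac{4s}{2s+d+1}}, $$

for all sufficiently small $\varepsilon > 0$.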

Proof. From Parseval's equality, we can decompose the $L^2$-risk of $\hat f^{nl}_{j_1,m_2}$ as follows

$$ R(\hat f^{nl}_{j_1,m_2}, f) = B_1 + B_2 + R_1 + R_2, $$
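where

$$ B_1 = \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} \sum_{m=-1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 = \int_T \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \,dt, $$

$$ B_2 = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=m_2+1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2, $$

$$ R_1 = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} E\left( |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2\, 1_{\{\hat B_{\lambda,m,r} > t_{\varepsilon,\delta}\}} \right), $$

$$ R_2 = \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} E\left( |\alpha_{\lambda,m,\ell}|^2\, 1_{\{\hat B_{\lambda,m,r} \leq t_{\varepsilon,\delta}\}} \right). $$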



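To bound the risk, we need to control the terms $B_1$, $B_2$, $R_1$ and $R_2$. Let $p_1' = \min(p_1, 2)$ and
$p_2' = \min(p_2, 2)$. Define also $s_1' = s_1 + d(1/2 - 1/p_1')$ and $s_2' = s_2 + 1/2 - 1/p_2'$.

By Lemma 1, there exists a constant $K_1' > 0$, only depending on $s_1$ and $p_1$, such that

$$ \sum_{j=j_1+1}^{+\infty} \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \leq K_1' A_1^2\, 2^{-2(j_1+1)s_1'}, $$

implying that

$$ B_1 \leq K_1' A_1^2\, 2^{-2(j_1+1)s_1'}. $$

Also, by Lemma 1, there exists a constant $K_2' > 0$, only depending on $s_2$ and $p_2$, such that

$$ \sum_{m=m_2+1}^{+\infty} \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 \leq K_2' A_\lambda^2\, 2^{-2(m_2+1)s_2'}, $$

implying, in view of equation (4.3), that

$$ B_2 \leq K_2' A_2^2\, 2^{-2(m_2+1)s_2'}. $$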

Consider the case $2 \leq p_1 \leq +\infty$, implying that $s_1' = s_1$. Thanks to the definitions of $j_1(\varepsilon)$ given
in (6.9) and $s$ given in (4.4), we obtain that

$$ B_1 = o\left( \varepsilon^{\frac{4s}{2s+d+1}} \right). $$

In the case $1 \leq p_1 < 2$, the condition $s_1 > d/p_1$, the definitions of $j_1(\varepsilon)$ given in (6.9) and $s$ given
in (4.4) also imply that

$$ B_1 = o\left( \varepsilon^{\frac{4s}{2s+d+1}} \right). $$

Consider the case $2 \leq p_2 \leq +\infty$, implying that $s_2' = s_2$. Thanks to the definitions of $m_2(\varepsilon)$ given
in (6.9) and $s$ given in (4.4), we obtain that

$$ B_2 = o\left( \varepsilon^{\frac{4s}{2s+d+1}} \right). $$

In the case $1 \leq p_2 < 2$, the condition $s_2 > 1/p_2$, the definitions of $m_2(\varepsilon)$ given in (6.9) and $s$
given in (4.4) also imply that

$$ B_2 = o\left( \varepsilon^{\frac{4s}{2s+d+1}} \right). $$
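Let us now bound each of $R_1$ and $R_2$ by the sum of two terms,

\[ R_1 \le R_{1,1} + R_{1,2} \quad \text{and} \quad R_2 \le R_{2,1} + R_{2,2}, \]

where

\begin{align*}
R_{1,1} &= \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} \mathbb{E}\Big( |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \, \mathbb{1}_{\{\sum_{\ell' \in U_{mr}} |y_{\lambda,m,\ell'} - \alpha_{\lambda,m,\ell'}|^2 \ge \frac{1}{4} t_{\epsilon,\delta}\}} \Big), \\
R_{1,2} &= \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} \mathbb{E}\big( |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \big) \, \mathbb{1}_{\{B_{\lambda,m,r} \ge \frac{1}{4} t_{\epsilon,\delta}\}}, \\
R_{2,1} &= \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} \mathbb{E}\Big( |\alpha_{\lambda,m,\ell}|^2 \, \mathbb{1}_{\{\sum_{\ell' \in U_{mr}} |y_{\lambda,m,\ell'} - \alpha_{\lambda,m,\ell'}|^2 \ge \frac{1}{4} t_{\epsilon,\delta}\}} \Big), \\
R_{2,2} &= \sum_{j=-1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \sum_{r \in A_m} \sum_{\ell \in U_{mr}} |\alpha_{\lambda,m,\ell}|^2 \, \mathbb{1}_{\{B_{\lambda,m,r} < \frac{5}{2} t_{\epsilon,\delta}\}},
\end{align*}

with $B_{\lambda,m,r} = \sum_{\ell \in U_{mr}} |\alpha_{\lambda,m,\ell}|^2$, and where we have used the inequality $|y_{\lambda,m,\ell}|^2 \le 2 |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 + 2 |\alpha_{\lambda,m,\ell}|^2$.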

Let us first give an upper bound for $\Delta_1 = R_{1,1} + R_{2,1}$ as follows. Using the Cauchy-Schwarz inequality, moment properties of Gaussian random variables, Lemma 1 and Lemma 2, we have



f / \ 1/2 \ 

Al < E E E E E \\y\rn/ - 0.x^mA) + l"A,m,^N 

j = -l\X\=jm=-lr<^Aml<^Umr ^ ^ 



< V3 2(^'i+i)'^+('"2+i)e2+ ^2^i2-'"^2 I e^('5/2-i)^ 

\ i=-l|A|=jm=-l 
= 0(^2(-''i+^)'^+("'2+l)g2+i(<5/2-l)2 _^gi(<5/2-l) = 

= O(e-), 

where we have used the assumption that $\delta > 2(2\sqrt{2}+1)$.
Now, let $\Delta_2 = R_{1,2} + R_{2,2}$. Let $j_0$ and $m_0$ be defined as



2^0 + 1 _ {2s+d+l)s[ J g^j-^J 2^0 + 1 — (2s+d+l)s'2 J 

where $s_1' = s_1 + d(1/2 - 1/p_1')$ and $s_2' = s_2 + 1/2 - 1/p_2'$. Note that $-1 \le j_0 \le j_1$ and $-1 \le m_0 \le m_2$ for all sufficiently small $\epsilon > 0$. Then, $\Delta_2$ can be partitioned as $\Delta_2 = \Delta_{2,1} + \Delta_{2,2}$, where the first component $\Delta_{2,1}$ is calculated over the indices $-1 \le j \le j_0$ and $-1 \le m \le m_0$, namely



JO 



mo 



A 



2,1 



EE E 

j=-l |A|=jm=-l 



■2m _ I 



£=0 



and the second component $\Delta_{2,2}$ is calculated over the remaining indices, namely



■2™-! 



A^.^ = E E E 

i=io+l |A|=im=-l 
io m2 

+ EE E 

J=-l |A|=j m=mo+l 





2'"-! 



51 ^(lyW-aA,m,^|') a{B,^_^>it^_,}+ Yl ^Wl{B,,^,.<|t,,4 



€=0 



Let us first give an upper bound for $\Delta_{2,1}$ as follows



JO 



mo 



A„ < E E E 

j=-l\\\=jm=-l 



■2'"-l 



L £=0 



reAr, 



O 



where we have used the moment properties of Gaussian random variables, and the fact that the blocks $A_m$ are of length $L_\epsilon$.

Now, we compute an upper bound for $\Delta_{2,2}$. We have



Jl 



m2 



i2,2 



s E E E 

j=jo+l |A|=jm=-l 



JO 



m2 



+ EE E 

j=-l \X\=j m=mo+l 



2 ^ 1, 



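Noticing that $\delta > 2$, we see that $\sum_{\ell \in U_{mr}} \mathbb{E}|y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \le \frac{1}{4} t_{\epsilon,\delta}$, which implies

\[ \Delta_{2,2} \le 2 \sum_{j=j_0+1}^{j_1} \sum_{|\lambda|=j} \sum_{m=-1}^{m_2} \Big[ \sum_{r \in A_m} B_{\lambda,m,r} \Big] + 2 \sum_{j=-1}^{j_0} \sum_{|\lambda|=j} \sum_{m=m_0+1}^{m_2} \Big[ \sum_{r \in A_m} B_{\lambda,m,r} \Big]. \]

Then, noticing that $\sum_{r \in A_m} B_{\lambda,m,r} = \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2$, by Lemma 1 we have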

ji P \ / jo m2 2^-1 

A2,2 = 01 V V / \c.m'dt\+o\ V yai V V K^,/ 



Vi=io+l / yj=-l|A|=j m=mo+l 



2msL 



using the definitions of $j_0$ and $m_0$. This completes the proof of the theorem. □
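
As an illustration of the keep-or-kill mechanism analysed in the proof, the following Python sketch applies block thresholding to one row of empirical wavelet coefficients. It is only a sketch under stated assumptions: the function name `block_threshold` is hypothetical, the block energy is taken to be the sum of squared coefficients in the block, and the threshold is assumed to have the form $\delta^2 \epsilon^2 L$, with $L$ the block length. The exact rule is the one given in (6.8).

```python
def block_threshold(coeffs, block_len, eps, delta):
    """Keep-or-kill block thresholding of a list of empirical coefficients.

    Assumption (illustrative convention, not taken verbatim from the paper):
    a block is kept when the sum of its squared coefficients is at least
    delta**2 * eps**2 * block_len, and set to zero otherwise.
    """
    threshold = delta ** 2 * eps ** 2 * block_len
    out = []
    for start in range(0, len(coeffs), block_len):
        block = coeffs[start:start + block_len]
        energy = sum(c * c for c in block)
        if energy >= threshold:
            out.extend(block)               # keep the whole block
        else:
            out.extend([0.0] * len(block))  # kill the whole block
    return out


# Toy example: one block with a strong signal, one block of small noise.
noisy = [0.9, 1.1, -1.0, 1.2, 0.01, -0.02, 0.03, 0.0]
print(block_threshold(noisy, block_len=4, eps=0.1, delta=2.0))
```

In this toy example the first block is kept unchanged and the second block is set to zero, since its energy falls below the assumed threshold.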

7 Numerical experiments 

We now illustrate the usefulness of the adaptive nonlinear wavelet estimator described in Section 6.3 with the help of simulated and real-data examples. The overall numerical study presented below has been carried out in the Matlab 7.7.0 programming environment.

7.1 Simulated data 

We have used as a synthetic 2-dimensional (2D) example the Shepp-Logan phantom image (see [Jain, 1989]) of size $N \times N$, with $N = 64$, displayed in Figure 7.3(a). This image is made of piecewise constant regions with different shapes that partition the $N \times N$ pixels into 6 regions represented by different colors in Figure 7.3(a). To each pixel of a given region, we associate a one-dimensional (1D) signal of length $n = 128$. In this way, we are able to create a time-dependent 2D image $\big( f(t_\ell, x_{(k_1,k_2)}) \big)_{1 \le \ell \le n,\ 1 \le k_1,k_2 \le N}$ that can be considered as the discretization of a function $f : [0,1] \times [0,1]^2 \to \mathbb{R}$ at equi-spaced time points $t_\ell$ and pixel locations $x_{(k_1,k_2)}$.
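
The construction just described (one 1D signal attached to every pixel of a region) can be sketched in Python as follows. This is a minimal illustration, not the code used for the study: the function name `build_time_dependent_image`, the label map `labels` and the dictionary `region_signals` are hypothetical, and a toy $2 \times 2$ image stands in for the $64 \times 64$ phantom.

```python
def build_time_dependent_image(labels, region_signals):
    """Return f[l][k1][k2]: the value at time index l for pixel (k1, k2).

    labels[k1][k2] is the (hypothetical) region index of a pixel and
    region_signals[region] is the 1D signal attached to that region.
    """
    n = len(next(iter(region_signals.values())))
    return [
        [[region_signals[label][l] for label in row] for row in labels]
        for l in range(n)
    ]


# Toy example: a 2 x 2 image with two regions and signals of length 3.
labels = [[0, 0],
          [0, 1]]
region_signals = {0: [0.0, 1.0, 2.0], 1: [5.0, 5.0, 5.0]}
f = build_time_dependent_image(labels, region_signals)
print(f[2])  # image at the last time index
```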
Then, we have created noisy data from the model 

\[ Y_{\ell,(k_1,k_2)} = f(t_\ell, x_{(k_1,k_2)}) + \sigma \, w_{\ell,(k_1,k_2)}, \qquad 1 \le k_1, k_2 \le N, \tag{7.1} \]

where the $w_{\ell,(k_1,k_2)}$ are i.i.d. standard Gaussian random variables, and $\sigma^2 > 0$ is the variance in the measurements, ranging from a low to a high level in the simulations (we took signal-to-noise ratios equal to 7, 5 and 3). It is well known in nonparametric statistics (see, e.g., [Brown and Low, 1996]) that there exists an asymptotic equivalence (in the Le Cam sense) between



the regression model (7.1) on $nN^2$ equi-spaced points, for each fixed $t \in T$, and the white noise model (1.1), when taking $\epsilon = \sigma / \sqrt{nN^2}$. Therefore, thanks to this asymptotic equivalence, one can use the 2D + time dependent wavelet block thresholding approach described in Section 6.3 to denoise data from model (7.1). To show the benefits of our approach, we compare it to two other methods:

- pixel by pixel denoising based on 1D wavelet thresholding: for each fixed pixel $(k_1, k_2)$, we apply a standard 1D wavelet-based denoising procedure with the universal threshold to the 1D data $(Y_{\ell,(k_1,k_2)})_{1 \le \ell \le n}$;

- slice by slice denoising based on 2D wavelet thresholding: for each fixed time $t_\ell$, we apply a standard 2D wavelet-based denoising procedure with the universal threshold to the 2D data $(Y_{\ell,(k_1,k_2)})_{1 \le k_1,k_2 \le N}$.
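
To make the pixel by pixel baseline concrete, here is a minimal Python sketch of 1D wavelet denoising with the universal threshold $\sigma \sqrt{2 \log n}$. Several choices are assumptions made for brevity and are not taken from the paper: the Haar wavelet, hard thresholding, a known noise level `sigma`, and the function names `haar_forward`, `haar_inverse` and `denoise_universal`.

```python
import math
import random


def haar_forward(signal):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Returns the coarsest approximation and the detail levels, finest first.
    """
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level upwards."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        approx = rebuilt
    return approx


def denoise_universal(signal, sigma):
    """Hard-threshold every detail coefficient at sigma * sqrt(2 log n)."""
    approx, details = haar_forward(signal)
    thr = sigma * math.sqrt(2.0 * math.log(len(signal)))
    kept = [[d if abs(d) > thr else 0.0 for d in level] for level in details]
    return haar_inverse(approx, kept)


def mse(a, b):
    """Mean squared error between two lists of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
sigma = 0.5
clean = [0.0] * 64 + [4.0] * 64  # piecewise constant signal, n = 128
noisy = [c + random.gauss(0.0, sigma) for c in clean]
denoised = denoise_universal(noisy, sigma)
print(mse(noisy, clean), mse(denoised, clean))
```

The slice by slice baseline follows the same pattern with a 2D transform applied to each image. Neither baseline shares information across both space and time, which is what the 2D + time block thresholding approach is designed to exploit.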

Then, we have generated $M = 100$ repetitions of model (7.1) for three different values of the considered signal-to-noise ratio. For each replication, the quality of an estimate $\hat f$ obtained by one of the above described methods is measured via its empirical mean squared error

\[ \frac{1}{nN^2} \sum_{\ell=1}^{n} \sum_{k_1,k_2=1}^{N} \Big( \hat f(t_\ell, x_{(k_1,k_2)}) - f(t_\ell, x_{(k_1,k_2)}) \Big)^2. \tag{7.2} \]
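
The empirical mean squared error can be computed as in the short Python sketch below. It assumes that the estimate and the true function are stored as nested lists indexed as `[time][row][column]` and that the error is averaged over all $nN^2$ points; the function name `empirical_mse` is hypothetical.

```python
def empirical_mse(estimate, truth):
    """Average squared error over all n time points and N x N pixels.

    Both arguments are nested lists indexed as [time][row][column].
    """
    total, count = 0.0, 0
    for est_slice, true_slice in zip(estimate, truth):
        for est_row, true_row in zip(est_slice, true_slice):
            for est, true in zip(est_row, true_row):
                total += (est - true) ** 2
                count += 1
    return total / count


# Toy example with n = 2 time points and N = 2.
truth = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
estimate = [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 3.0]]]
print(empirical_mse(estimate, truth))
```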

The results of these simulations are displayed in Figure 7.4 in the form of boxplots of the empirical mean squared error. Clearly, our approach yields the best results. The benefits of our method can also be clearly seen from the images displayed in Figure 7.5, which show temporal cuts of the various estimators for a given simulation of the model.

7.2 Real data 

Now, we return to the real-data example on satellite remote sensing data discussed in Section 2. To apply the suggested adaptive nonlinear wavelet estimator, it is necessary to estimate the level of noise in the measurements. For this purpose, we estimate the level of noise in each 2D image at each wavelength using the median absolute deviation (MAD) of the empirical 2D wavelet coefficients at the highest level of resolution (see [Antoniadis et al., 2001] for further details on this procedure). Then, to apply our method, we took $\epsilon = \hat\sigma / \sqrt{nN^2}$, with $\hat\sigma$ being the maximum of these estimated values by MAD over the $n = 128$ wavelengths, with $N = 64$. The result of our denoising procedure is displayed in Figure 7.6.
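
A minimal Python sketch of a MAD-based noise estimate for a single 2D image is given below. It rests on assumptions that are not taken from the paper: the Haar wavelet, the use of the finest-scale diagonal detail coefficients only, and the usual Gaussian calibration constant 0.6745. The function name `mad_sigma` is hypothetical.

```python
import random
import statistics


def mad_sigma(image):
    """Estimate the noise standard deviation of a 2D image (list of rows).

    Uses the finest-scale diagonal Haar detail coefficients, one per
    2 x 2 block, and divides the median absolute value by 0.6745.
    """
    details = []
    for i in range(0, len(image) - 1, 2):
        for j in range(0, len(image[0]) - 1, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            details.append((a - b - c + d) / 2.0)
    return statistics.median(abs(x) for x in details) / 0.6745


# Toy check: a constant 64 x 64 image corrupted by Gaussian noise.
random.seed(1)
sigma = 2.0
image = [[10.0 + random.gauss(0.0, sigma) for _ in range(64)] for _ in range(64)]
estimate_sigma = mad_sigma(image)
print(estimate_sigma)
```

In the setting described above, such an estimate would be computed for each of the wavelengths and the maximum retained as $\hat\sigma$.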



Figure 7.4: Boxplots of the empirical mean squared error (7.2) over M = 100 simulations from model (7.1) for the three methods (from left to right: pixel by pixel denoising based on 1D wavelet thresholding, slice by slice denoising based on 2D wavelet thresholding, 2D + time wavelet block thresholding) and for various values of the signal-to-noise ratio (SNR): (a) SNR = 7; (b) SNR = 5; (c) SNR = 3.



8 Concluding remarks 

We considered the nonparametric estimation problem of time-dependent multivariate functions observed in the presence of additive cylindrical Gaussian white noise of a small intensity. We derived minimax lower bounds for the $L^2$-risk in the proposed spatio-temporal model as the
intensity goes to zero, when the underlying unknown response function is assumed to belong to 
a ball of appropriately constructed inhomogeneous time-dependent multivariate functions. The 
choice of this class of functions was motivated by real-data examples and illustrated with the 
help of an example on satellite remote sensing data. We also proposed both non-adaptive linear 
and adaptive non-linear wavelet estimators that are asymptotically optimal (in the minimax 
sense) in a wide range of the so-constructed balls of inhomogeneous time-dependent multivariate 
functions. The usefulness of the suggested adaptive nonlinear wavelet estimator was illustrated 
with the help of simulated and real-data examples. 

Some extensions of the present work are possible. They are briefly mentioned below. 

Figure 7.5: Typical estimates obtained by the three methods with a signal-to-noise ratio SNR = 5 for a temporal cut at $t = 0.5748$: (a) true image without additive noise; (b) 2D + time wavelet block thresholding (our method); (c) slice by slice denoising based on 2D wavelet thresholding; (d) pixel by pixel denoising based on 1D wavelet thresholding.

[Inverse Problems] Model (1.1) can be extended to the case where the signal is observed through a linear operator plus noise. More precisely, one can consider the nonparametric estimation problem of time-dependent multivariate functions observed through a known or unknown linear operator with kernel $k(x, u)$ and in the presence of additive cylindrical Gaussian white noise, namely

\[ dY_\epsilon(t,x) = \Big( \int_X k(x,u) f(t,u) \, du \Big) dx + \epsilon \, dW(t,x), \tag{8.1} \]

where, as earlier, $t \in T$ ($T$ is a compact subset of $\mathbb{R}$) is the time variable, $x \in X$ ($X$ is a compact subset of $\mathbb{R}^d$, $d \ge 1$) is the space variable, $f \in L^2(T \times X)$ is the time-dependent multivariate function that we wish to estimate, $dW(t,x)$ is a cylindrical orthogonal Gaussian random measure (representing additive noise in the measurements), and $\epsilon > 0$ is a small level of noise, that may be let go to zero for studying asymptotic properties. An important example of kernel is the case where

\[ k(x,u) = h(u - x) \quad \text{for some function } h : \mathbb{R}^d \to \mathbb{R}, \]
(with known or unknown singular values) leading to a time-dependent multivariate deconvolution problem. (Note that a sub-class of this model is the case of direct noisy observations of the time-dependent multivariate functions $f(t,x)$, $t \in T$, $x \in X$, namely model (1.1) considered in this work.)

Figure 7.6: Satellite remote sensing image (64 x 64 pixels, over 128 wavelengths): (a) a 2D image measured at a specific wavelength (raw data); (b) 2D image at the same wavelength obtained after applying our method; (c) evolution over wavelength of the intensities of the two pixels in green and blue shown in Figure ETT] (raw data); (d) intensities of these two pixels after denoising by our method.

[Smoothness Assumption] In either model (1.1) or model (8.1), instead of using the standard (isotropic) $d$-dimensional Besov spaces on $X$ to describe the smoothness of the underlying unknown response function $f(t,x)$, for each fixed $t \in T$, one could consider anisotropic $d$-dimensional Besov spaces on $X$, where different smoothness is assumed in each direction (see, e.g., [Kyriazis, 2004]). Another possibility is to consider subclasses of the so-called decomposition spaces that cover both the cases of standard (isotropic) $d$-dimensional Besov spaces as well as, in the case when $d = 2$, smoothness spaces corresponding to curvelet-type constructions (see [Chesneau et al., 2010]).

The above extensions are projects for future work that we hope to address elsewhere. 

9 Appendix 

9.1 Besov space and wavelet approximations 

Lemma 1. Let $A_1 > 0$ and $A_2 > 0$ be constants. Let $s_1 > 0$ and $s_2 > 0$ be the smoothness parameters in the space and time domains, respectively, such that $0 < s_1 < \tau_1$ and $0 < s_2 < \tau_2$, where $\tau_1$ and $\tau_2$ are the regularity parameters of the wavelet systems used in the space and time domains, respectively. Let $1 \le p_1, q_1 \le +\infty$ and $1 \le p_2, q_2 \le +\infty$. Assume that $f \in B^{s_1,s_2}_{p_1,q_1,p_2,q_2}(A_1, A_2)$. Define $s_1' = s_1 + d(1/2 - 1/p_1') > 0$ with $p_1' = \min(p_1, 2)$ and $s_2' = s_2 + 1/2 - 1/p_2' > 0$ with $p_2' = \min(p_2, 2)$. Let $\alpha_\lambda(t)$ be defined as in (4.1), and $\alpha_{\lambda,m,\ell}$ be defined as in (4.2). Then, for every $-1 \le j < +\infty$,

\[ \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \le K_1' A_1^2 \, 2^{-2js_1'}, \]

for some constant $K_1' > 0$, only depending on $s_1$ and $p_1$, and for every $-1 \le m < +\infty$,

\[ \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 \le K_2' A_2^2 \, 2^{-2ms_2'}, \]

for some constant $K_2' > 0$, only depending on $s_2$ and $p_2$.

Proof. Since, for each $t \in T$, $f(t,\cdot) \in B^{s_1}_{p_1,q_1}(A_1)$ with $1 \le p_1, q_1 \le +\infty$, using standard embedding properties of Besov spaces, there exists a constant $K_1' > 0$, only depending on $s_1$ and $p_1$, such that for every $-1 \le j < +\infty$

\[ \sum_{|\lambda|=j} |\alpha_\lambda(t)|^2 \le K_1' A_1^2 \, 2^{-2js_1'}. \]

By the definition of $B^{s_1,s_2}_{p_1,q_1,p_2,q_2}(A_1, A_2)$, and using standard embedding properties of Besov spaces, there exists a constant $K_2' > 0$, only depending on $s_2$ and $p_2$, such that for every $-1 \le m < +\infty$

\[ \sum_{\ell=0}^{2^m-1} |\alpha_{\lambda,m,\ell}|^2 \le K_2' A_2^2 \, 2^{-2ms_2'}. \]

This completes the proof of the lemma. □

9.2 A large deviation inequality

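Lemma 2. Let $\delta > 1$. Then, for any $-1 \le m < +\infty$ and $r \in A_m$,

\[ \mathbb{P}\Big( \sum_{\ell \in U_{m,r}} |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \ge \delta^2 \epsilon^2 L_\epsilon \Big) \le \exp\Big( -\frac{(\delta-1)^2 L_\epsilon}{2} \Big). \tag{9.1} \]

Proof. The proof is inspired by the arguments used in the proof of Lemma 2 in [Pensky and Sapatinas, 2009]. Consider the set of vectors

\[ \Omega_{m,r} = \Big\{ v_\ell \in \mathbb{R} \setminus \{0\},\ \ell \in U_{m,r} : \sum_{\ell \in U_{m,r}} v_\ell^2 \le 1 \Big\}, \]

and the centered Gaussian process defined by

\[ Z_{m,r}(v) = \sum_{\ell \in U_{m,r}} v_\ell \, (y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}). \]

By Lemma 5 in [Pensky and Sapatinas, 2009], we need to find upper bounds for $\mathbb{E}\big( \sup_{v \in \Omega_{m,r}} Z_{m,r}(v) \big)$ and $\sup_{v \in \Omega_{m,r}} \mathrm{Var}(Z_{m,r}(v))$. By the Cauchy-Schwarz inequality,

\[ \sup_{v \in \Omega_{m,r}} Z_{m,r}(v) = \sup_{v \in \Omega_{m,r}} \sum_{\ell \in U_{m,r}} v_\ell \, (y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}) = \Big( \sum_{\ell \in U_{m,r}} |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \Big)^{1/2}. \]

Furthermore, Jensen's inequality implies that

\[ \mathbb{E}\Big[ \sup_{v \in \Omega_{m,r}} Z_{m,r}(v) \Big] = \mathbb{E}\Big( \sum_{\ell \in U_{m,r}} |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \Big)^{1/2} \le \Big( \sum_{\ell \in U_{m,r}} \mathbb{E}|y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \Big)^{1/2} = \epsilon L_\epsilon^{1/2}. \]

By independence of $y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}$ and $y_{\lambda,m,\ell'} - \alpha_{\lambda,m,\ell'}$ for $\ell \ne \ell'$, we obtain

\[ \sup_{v \in \Omega_{m,r}} \mathrm{Var}(Z_{m,r}(v)) = \sup_{v \in \Omega_{m,r}} \sum_{\ell \in U_{m,r}} v_\ell^2 \, \mathrm{Var}(y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}) \le \epsilon^2. \]

Thus, by Lemma 5 in [Pensky and Sapatinas, 2009], one has that

\[ \mathbb{P}\Big( \Big( \sum_{\ell \in U_{m,r}} |y_{\lambda,m,\ell} - \alpha_{\lambda,m,\ell}|^2 \Big)^{1/2} \ge x + \epsilon L_\epsilon^{1/2} \Big) \le \exp\Big( -\frac{x^2}{2\epsilon^2} \Big) \]

for any $x > 0$. Finally, by taking $x = (\delta - 1)\epsilon L_\epsilon^{1/2}$, we arrive at (9.1), thus completing the proof of the lemma. □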
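
The concentration step of the proof can be checked numerically. The Python sketch below compares a Monte Carlo estimate of the probability that a block of $L$ independent centered Gaussian errors with variance $\epsilon^2$ has energy at least $\delta^2 \epsilon^2 L$ with the bound $\exp(-(\delta-1)^2 L / 2)$ that follows from taking $x = (\delta-1)\epsilon L^{1/2}$. Reading (9.1) in this exponential form is an assumption, and the function names `tail_frequency` and `gaussian_bound` are hypothetical.

```python
import math
import random


def tail_frequency(block_len, delta, eps=1.0, trials=20000, seed=0):
    """Monte Carlo frequency of {sum of squared noise >= delta^2 eps^2 L}."""
    rng = random.Random(seed)
    level = delta ** 2 * eps ** 2 * block_len
    hits = 0
    for _ in range(trials):
        energy = sum(rng.gauss(0.0, eps) ** 2 for _ in range(block_len))
        hits += energy >= level
    return hits / trials


def gaussian_bound(block_len, delta):
    """Bound exp(-(delta - 1)^2 L / 2) obtained from the concentration step."""
    return math.exp(-((delta - 1.0) ** 2) * block_len / 2.0)


L, delta = 5, 1.5
print(tail_frequency(L, delta), gaussian_bound(L, delta))
```

For small blocks the bound is far from tight, which is consistent with its role in the proof: it only needs to make the large-deviation terms negligible once $\delta$ is large enough.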